Ambiguous POMDPs: Structural Results and Applications
Abstract
Markov Decision Processes (MDPs) and their generalization, Partially Observable MDPs (POMDPs), have been widely studied and used as invaluable tools in dynamic stochastic decision-making. However, two major barriers have limited their application to problems arising in practical settings: (a) computational challenges for problems with large state or action spaces, and (b) ambiguity in transition probabilities, which are typically hard to quantify. While several solutions have been proposed for the first challenge, known as the “curse of dimensionality,” the second challenge remains unsolved, and even untouched in the case of POMDPs. We refer to this second challenge as the “curse of ambiguity,” and address it by developing a generalization of POMDPs termed Ambiguous POMDPs (APOMDPs). The proposed generalization not only allows the decision maker to account for imperfect state information, but also tackles the inevitable ambiguity about the correct probabilistic model. Importantly, this paper extends various structural results from POMDPs to APOMDPs. Such structural results can guide the decision maker toward robust decisions under model ambiguity. Robustness is achieved via α-maximin expected utility (α-MEU), which (a) differentiates between ambiguity and ambiguity attitude, (b) avoids the over-conservatism of the traditional maximin approaches widely used in robust optimization, and (c) has been found in laboratory experiments to describe a variety of choice behaviors well, including those in portfolio selection. The structural results also help to handle the “curse of dimensionality,” since they significantly simplify the search for an optimal policy. Furthermore, we provide an analytical performance guarantee for the APOMDP approach by developing a bound on its maximum reward loss due to model ambiguity. To generate further insight into how APOMDPs can support better decisions, we also discuss specific applications, including machine replacement, medical decision-making, inventory control, revenue management, optimal search, sequential design of experiments, bandit problems, and dynamic principal-agent models.
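For reference, a standard statement of the α-MEU criterion (the notation here is generic, not taken from the paper): a policy π is evaluated against a set C of candidate probability models as

\[
V(\pi) \;=\; \alpha \min_{P \in \mathcal{C}} \mathbb{E}_P\big[u(\pi)\big] \;+\; (1-\alpha) \max_{P \in \mathcal{C}} \mathbb{E}_P\big[u(\pi)\big], \qquad \alpha \in [0,1],
\]

where C captures the ambiguity (the set of plausible models) and α the ambiguity attitude: α = 1 recovers the maximin criterion of classical robust optimization, while α = 0 is maximally optimistic. As a minimal sketch of how this criterion might drive action selection in a belief-state model, the following code performs a one-step α-MEU lookahead over a finite ambiguity set of transition kernels. All names, shapes, and the finite ambiguity set are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def alpha_meu_action(belief, transition_models, reward, v_next, alpha=0.5):
    """One-step alpha-MEU lookahead (illustrative sketch, not the paper's method).

    belief            : (S,) belief over hidden states
    transition_models : iterable of (A, S, S) arrays -- the ambiguity set
    reward            : (A, S) expected immediate reward per action and state
    v_next            : (S,) continuation value of the next hidden state
    alpha             : ambiguity attitude in [0, 1]; 1 = pure maximin
    """
    n_actions = reward.shape[0]
    values = np.empty(n_actions)
    for a in range(n_actions):
        # Expected value of action a under each candidate transition model.
        per_model = [belief @ (reward[a] + T[a] @ v_next)
                     for T in transition_models]
        # alpha-MEU: weighted mix of worst-case and best-case model values.
        values[a] = alpha * min(per_model) + (1 - alpha) * max(per_model)
    return int(np.argmax(values)), values

# Toy usage: two candidate kernels for a 2-state, 2-action model.
T1 = np.array([[[0.9, 0.1], [0.2, 0.8]],
               [[0.6, 0.4], [0.5, 0.5]]])
T2 = np.array([[[0.7, 0.3], [0.4, 0.6]],
               [[0.8, 0.2], [0.3, 0.7]]])
r = np.array([[1.0, 0.0],
              [0.5, 0.5]])
best_a, vals = alpha_meu_action(np.array([0.5, 0.5]), [T1, T2], r,
                                v_next=np.zeros(2), alpha=0.7)
```

Raising α toward 1 shades the choice toward the worst-case model; any α < 1 tempers the conservatism of pure maximin while retaining robustness.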
Similar Articles
Ambiguous Partially Observable Markov Decision Processes: Structural Results and Applications
Markov Decision Processes (MDPs) and their generalization, Partially Observable MDPs (POMDPs), have been widely studied and used as invaluable tools in dynamic stochastic decision-making. However, two major barriers have limited their application to problems arising in various practical settings: (a) computational challenges for problems with large state or action spaces, and (b) ambiguity in ...
Networked Distributed POMDPs: A Synergy of Distributed Constraint Optimization and POMDPs
In many real-world multiagent applications such as distributed sensor nets, a network of agents is formed based on each agent’s limited interactions with a small number of neighbors. While distributed POMDPs capture the realworld uncertainty in multiagent domains, they fail to exploit such locality of interaction. Distributed constraint optimization (DCOP) captures the locality of interaction b...
POMDP Structural Results for Controlled Sensing
Structural results for POMDPs are important since solving POMDPs numerically is typically intractable. Solving a classical POMDP is known to be PSPACE-complete [40]. Moreover, in controlled sensing problems [16], [26], [10], it is often necessary to use POMDPs that are nonlinear in the belief state in order to model the uncertainty in the state estimate. (For example, the variance of the state...
Planning in Stochastic Domains: Problem Characteristics and Approximations (Version II)
This paper is about planning in stochastic domains by means of partially observable Markov decision processes (POMDPs). POMDPs are difficult to solve and approximation is a must in real-world applications. Approximation methods can be classified into those that solve a POMDP directly and those that approximate a POMDP model by a simpler model. Only one previous method falls into the second categor...
Properly Acting under Partial Observability with Action Feasibility Constraints
We introduce Action-Constrained Partially Observable Markov Decision Process (AC-POMDP), which arose from studying critical robotic applications with damaging actions. AC-POMDPs restrict the optimized policy to only apply feasible actions: each action is feasible in a subset of the state space, and the agent can observe the set of applicable actions in the current hidden state, in addition to s...